DEC's business and technology objectives require a strong research program.
The Systems Research Center (SRC) and three other research laboratories 
are committed to filling that need. 

SRC began recruiting its first research scientists in 1984 — their charter, to 
advance the state of knowledge in all aspects of computer systems research. 
Our current work includes exploring high-performance personal computing, 
distributed computing, programming environments, system modelling
techniques, specification technology, and tightly-coupled multiprocessors.

Our approach to both hardware and software research is to create and use 
real systems so that we can investigate their properties fully. Complex 
systems cannot be evaluated solely in the abstract. Based on this belief, 
our strategy is to demonstrate the technical and practical feasibility of our 
ideas by building prototypes and using them as daily tools. The experience 
we gain is useful in the short term in enabling us to refine our designs, and 
invaluable in the long term in helping us to advance the state of knowledge 
about those systems. Most of the major advances in information systems 
have come through this strategy, including time-sharing, the ArpaNet, and 
distributed personal computing. 

SRC also performs work of a more mathematical flavor which complements 
our systems research. Some of this work is in established fields of theoretical 
computer science, such as the analysis of algorithms, computational
geometry, and logics of programming. The rest of this work explores new ground
motivated by problems that arise in our systems research. 

DEC has a strong commitment to communicating the results and experience 
gained through pursuing these activities. The Company values the improved 
understanding that comes with exposing and testing our ideas within the 
research community. SRC will therefore report results in conferences, in 
professional journals, and in our research report series. We will seek users 
for our prototype systems among those with whom we have common research 
interests, and we will encourage collaboration with university researchers. 
Author's Abstract



A multiprocess program executing on a modern multiprocessor must issue
explicit commands to synchronize memory accesses. A method is proposed
for deriving the necessary commands from a correctness proof of the
algorithm.



Capsule Review 

Recently, a number of mechanisms for interprocess synchronization have
been proposed. As engineers attempt to implement multiprocessors of
increasing scale and performance, these mechanisms have become quite
complex and difficult to reason about.

This short paper presents a formalism based only on two ordering relations 
between the events of an algorithm, "precedes" and "can affect". It allows 
the mechanisms that must be provided to ensure the algorithm's correctness 
to be determined directly from the correctness proof. The formalism and 
its application to an example mutual exclusion algorithm are presented and 
discussed. 

Although the paper is quite terse, a careful reading will reward those
interested in concurrency or multiprocessor design.

Chuck Thacker 
1 The Problem



Accessing a single memory location in a multiprocessor is traditionally
assumed to be atomic. Such atomicity is a fiction; a memory access consists of
a number of hardware actions, and different accesses may be executed
concurrently. Early multiprocessors maintained this fiction, but more modern
ones usually do not. Instead, they provide special commands with which
processes themselves can synchronize memory accesses. The programmer
must determine, for each particular computer, what synchronization
commands are needed to make his program correct.

One proposed method for achieving the necessary synchronization is with a
constrained style of programming specific to a particular type of
multiprocessor architecture [7, 8]. Another method is to reason about the
program in a mathematical abstraction of the architecture [5]. We take a
different approach and derive the synchronization commands from a proof of
correctness of the algorithm.

The commonly used formalisms for describing multiprocess programs
assume atomicity of memory accesses. When an assumption is built into a
formalism, it is difficult to discover from a proof where the assumption is
actually needed. Proofs based on these formalisms, including invariance proofs
[4, 16] and temporal-logic proofs [17], therefore seem incapable of yielding
the necessary synchronization requirements. We derive these requirements
from proofs based on a little-used formalism that makes no atomicity
assumptions [11, 12, 14]. This proof method is quite general and has been
applied to a number of algorithms. The method of extracting synchronization
commands from a proof is described by an example, a simple mutual
exclusion algorithm. It can be applied to the proof of any algorithm.

Most programs are written in higher-level languages that provide
abstractions, such as locks for shared data, that free the programmer from
concerns about the memory architecture. The compiler generates synchronization
commands to implement the abstractions. However, some algorithms,
especially within the operating system, require more efficient
implementations than can be achieved with high-level language abstractions. It
is to these algorithms, as well as to algorithms for implementing the
higher-level abstractions, that our method is directed.
2 The Formalism



An execution of a program is represented by a collection of operation
executions with the two relations $\rightarrow$ (read precedes) and
$\dashrightarrow$ (read can affect). An operation execution can be
interpreted as a nonempty set of events, where the relations $\rightarrow$
and $\dashrightarrow$ have the following meanings.

$A \rightarrow B$: every event in $A$ precedes every event in $B$.
$A \dashrightarrow B$: some event in $A$ precedes some event in $B$.

However, this interpretation serves only to aid our understanding. Formally, 
we just assume that the following axioms hold, for any operation executions 
A, B, C, and D. 

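A1. $\rightarrow$ is transitive ($A \rightarrow B \rightarrow C$ implies $A \rightarrow C$) and irreflexive
    ($A \not\rightarrow A$).

A2. $A \rightarrow B$ implies $A \dashrightarrow B$ and $B \not\dashrightarrow A$.

A3. $A \rightarrow B \dashrightarrow C$ or $A \dashrightarrow B \rightarrow C$ implies $A \dashrightarrow C$.

A4. $A \rightarrow B \dashrightarrow C \rightarrow D$ implies $A \rightarrow D$.

A5. For any $A$ there are only a finite number of $B$ such that $A \not\rightarrow B$.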

The last axiom essentially asserts that all operation executions terminate;
nonterminating operations satisfy a different axiom that is not relevant here.
Axiom A5 is useful only for proving liveness properties; safety properties are
proved with Axioms A1-A4. Anger [3] and Abraham and Ben-David [1]
introduced the additional axiom

A6. $A \dashrightarrow B \rightarrow C \dashrightarrow D$ implies $A \dashrightarrow D$.

and showed that A1-A6 form a complete axiom system for the interpretation
based on operation executions as sets of events.
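
As an informal sanity check of the sets-of-events interpretation, the
following Python sketch models each operation execution as a nonempty set of
integer timestamps and tests A1-A4 and A6 by brute force over a small
universe. The integer timestamps (a single totally ordered clock), the helper
names, and the four-point universe are assumptions made only for
illustration; the formalism itself requires none of them. A5 is omitted
because it holds trivially when the collection of operation executions is
finite.

```python
from itertools import combinations, product


def precedes(a, b):
    """A -> B: every event in A precedes every event in B."""
    return all(x < y for x in a for y in b)


def can_affect(a, b):
    """A --> B (can affect): some event in A precedes some event in B."""
    return any(x < y for x in a for y in b)


def implies(p, q):
    return (not p) or q


def check_axioms(a, b, c, d):
    """Evaluate A1-A4 and A6 for one choice of operation executions."""
    return {
        "A1": not precedes(a, a)
        and implies(precedes(a, b) and precedes(b, c), precedes(a, c)),
        "A2": implies(precedes(a, b), can_affect(a, b) and not can_affect(b, a)),
        "A3": implies(
            (precedes(a, b) and can_affect(b, c))
            or (can_affect(a, b) and precedes(b, c)),
            can_affect(a, c),
        ),
        "A4": implies(
            precedes(a, b) and can_affect(b, c) and precedes(c, d), precedes(a, d)
        ),
        "A6": implies(
            can_affect(a, b) and precedes(b, c) and can_affect(c, d), can_affect(a, d)
        ),
    }


# Hypothetical universe: every operation execution consisting of one or two
# of the timestamps 0..3.
OPS = [frozenset(s) for k in (1, 2) for s in combinations(range(4), k)]


def all_axioms_hold():
    """Check the axioms for every quadruple drawn from OPS."""
    return all(
        all(check_axioms(a, b, c, d).values())
        for a, b, c, d in product(OPS, repeat=4)
    )


# Two overlapping operation executions: neither precedes the other, yet each
# can affect the other.
A, B = frozenset({0, 2}), frozenset({1, 3})
```

Calling all_axioms_hold() examines every quadruple of the ten operation
executions in OPS. The pair A, B illustrates concurrency in this model: the
events of the two operation executions interleave, so only the weaker
can-affect relation holds between them, and it holds in both directions.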

Axioms A1-A6 are independent of what the operation executions do.
Reasoning about a multiprocess program requires additional axioms to capture
the semantics of its operations. The appropriate axioms for read and write 
operations will depend on the nature of the memory system. 
The only assumptions we make about operation executions are axioms A1-A5
and axioms about read and write operations. We do not assume that
$\rightarrow$ and $\dashrightarrow$ are the relations obtained by interpreting an
operation execution as the set of all its events. For example, sequential
consistency [10] is equivalent to the condition that $\rightarrow$ is a total
ordering on the set of operation executions, a condition that can be satisfied
even though the events comprising different operation executions are actually
concurrent.

This formalism was developed in an attempt to provide elegant proofs of
concurrent algorithms: proofs that replace conventional behavioral
arguments with axiomatic reasoning in terms of the two relations $\rightarrow$
and $\dashrightarrow$. Although the simplicity of such proofs has been
questioned [6], they do tend to capture the essence of why an algorithm works.

3 An Example 

3.1 An Algorithm and its Proof 

Figure 1 shows process $i$ of a simple $N$-process mutual exclusion
algorithm [13]. We prove that the algorithm guarantees mutual exclusion (two
processes are never concurrently in their critical sections). The algorithm is
also deadlock-free (some critical section is eventually executed unless all
processes halt in their noncritical sections), but we do not consider this
liveness property. Starvation of individual processes is possible.

The algorithm uses a standard protocol to achieve mutual exclusion. Before
entering its critical section, each process $i$ must first set $x_i$ true and
then find $x_j$ false, for all other processes $j$. Mutual exclusion is
guaranteed because, when process $i$ finds $x_j$ false, process $j$ cannot enter
its critical section until it sets $x_j$ true and finds $x_i$ false, which is
impossible until $i$ has exited the critical section and reset $x_i$. The proof
of correctness formalizes this argument.
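
The informal argument can be checked mechanically under the traditional
assumption that this paper sets out to avoid, namely that each read and write
is atomic. The Python sketch below explores every interleaving of the
algorithm of Figure 1 for a small number of processes and reports whether two
processes are ever in their critical sections at once. The state encoding,
the label names, and the announce switch (which deliberately breaks the
protocol by never writing true to the process's own flag) are illustrative
assumptions, not part of the paper. Processes are numbered from 0 here.

```python
from collections import deque


def step(state, i, n, announce=True):
    """Successor state after process i performs one atomic action."""
    x, procs = state
    x = list(x)
    pc, j = procs[i]
    if pc == "noncritical":
        pc = "set"
    elif pc == "set":                  # l: x_i := true
        x[i] = announce                # announce=False models a broken protocol
        pc, j = "scan_low", 0
    elif pc == "scan_low":             # for j := 1 until i - 1
        if j == i:
            pc, j = "scan_high", i + 1
        elif x[j]:                     # if x_j then x_i := false ...
            pc = "back_off"
        else:
            j += 1
    elif pc == "back_off":
        x[i] = False
        pc = "wait"
    elif pc == "wait":                 # while x_j do od; goto l
        if not x[j]:
            pc, j = "set", 0
    elif pc == "scan_high":            # for j := i + 1 until N do while x_j do od od
        if j == n:
            pc, j = "critical", 0
        elif not x[j]:
            j += 1
    elif pc == "critical":
        pc = "reset"
    elif pc == "reset":                # x_i := false
        x[i] = False
        pc = "noncritical"
    procs = procs[:i] + ((pc, j),) + procs[i + 1:]
    return tuple(x), procs


def mutual_exclusion_holds(n, announce=True):
    """Breadth-first search over all reachable states of n processes."""
    start = ((False,) * n, (("noncritical", 0),) * n)
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if sum(pc == "critical" for pc, _ in state[1]) > 1:
            return False               # two processes in the critical section
        for i in range(n):
            nxt = step(state, i, n, announce)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True
```

Such a search covers only interleavings of atomic accesses. It says nothing
about memory systems in which accesses are not atomic, which is the case
addressed by the proof that follows.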

To prove mutual exclusion, we first name the following operation executions
that occur during the $m$th iteration of process $i$'s repeat loop.

$L_i^m$  The last execution of statement $l$ prior to entering the critical
    section. This operation execution sets $x_i$ to true.

$R_{i,j}^m$  The last read of $x_j$ before entering the critical section. This
    read obtains the value false.

$CS_i^m$  The execution of the critical section.

$X_i^m$  The write to $x_i$ after exiting the critical section. It writes the
    value false.

    repeat forever
        noncritical section;
        l: x_i := true;
        for j := 1 until i - 1
            do if x_j then x_i := false;
                           while x_j do od;
                           goto l fi od;
        for j := i + 1 until N do while x_j do od od;
        critical section;
        x_i := false
    end repeat

Figure 1: Process $i$ of an $N$-process mutual-exclusion algorithm.

Mutual exclusion asserts that $CS_i^m$ and $CS_j^n$ are not concurrent, for all
$m$ and $n$, if $i \neq j$ (see footnote 1). Two operations are nonconcurrent if
one precedes ($\rightarrow$) the other. Thus, mutual exclusion is implied by the
assertion that, for all $m$ and $n$, either $CS_i^m \rightarrow CS_j^n$ or
$CS_j^n \rightarrow CS_i^m$, if $i \neq j$.

The proof of mutual exclusion, using axioms A1-A4 and assumptions B1-B4
below, appears in Figure 2. It is essentially the same proof as in [13],
except that the properties required of the memory system have been
isolated and named B1-B4. (In [13], these properties are deduced from other
assumptions.)

B1-B4 are as follows, where universal quantification over $m$, $n$, $i$, and $j$
is assumed. B4 is discussed below.

B1. $L_i^m \rightarrow R_{i,j}^m$
B2. $R_{i,j}^m \rightarrow CS_i^m$
B3. $CS_i^m \rightarrow X_i^m$

1 Except where indicated otherwise, all assertions have as an unstated hypothesis the 
assumption that the operation executions they mention actually occur. For example, the 
theorem in Figure 2 has the hypothesis that $CS_i^m$ and $CS_j^n$ occur.
Theorem  For all $m$, $n$, $i$, and $j$ such that $i \neq j$, either
$CS_i^m \rightarrow CS_j^n$ or $CS_j^n \rightarrow CS_i^m$.

Case A: $R_{i,j}^m \dashrightarrow L_j^n$.

    1. $L_i^m \rightarrow R_{j,i}^n$
       Proof: B1, case assumption, B1 (applied to $L_j^n$ and $R_{j,i}^n$), and A4.

    2. $R_{j,i}^n \not\dashrightarrow L_i^m$
       Proof: 1 and A2.

    3. $X_i^m \dashrightarrow R_{j,i}^n$
       Proof: 2 and B4 (applied to $R_{j,i}^n$, $L_i^m$, and $X_i^m$).

    4. $CS_i^m \rightarrow CS_j^n$
       Proof: B3, 3, B2 (applied to $R_{j,i}^n$ and $CS_j^n$), and A4.

Case B: $R_{i,j}^m \not\dashrightarrow L_j^n$.

    1. $X_j^n \dashrightarrow R_{i,j}^m$
       Proof: Case assumption and B4.

    2. $CS_j^n \rightarrow CS_i^m$
       Proof: B3 (applied to $CS_j^n$ and $X_j^n$), 1, B2, and A4.

Figure 2: Proof of mutual exclusion for the algorithm of Figure 1.



B4. If $R_{i,j}^m \not\dashrightarrow L_j^n$ then $X_j^n$ exists and $X_j^n \dashrightarrow R_{i,j}^m$.

Although B4 cannot be proved without additional assumptions, it merits an
informal justification. The hypothesis, $R_{i,j}^m \not\dashrightarrow L_j^n$,
asserts that process $i$'s read $R_{i,j}^m$ of $x_j$ occurred too late for any of
its events to have preceded any of the events in process $j$'s write $L_j^n$ of
$x_j$. It is reasonable to infer that the value obtained by the read was written
by $L_j^n$ or a later write to $x_j$. Since $L_j^n$ writes true and $R_{i,j}^m$ is
a read of false, $R_{i,j}^m$ must read the value written by a later write. The
first write of $x_j$ issued after $L_j^n$ is $X_j^n$, so we expect
$X_j^n \dashrightarrow R_{i,j}^m$ to hold.



3.2 The Implementation 

Implementing the algorithm for a particular memory architecture may
require synchronization commands to assure B1-B4. Most proposed memory
systems satisfy the following property.

C1. All write operations to a single memory cell by any one process are
    observed by other processes in the order in which they were issued.
They also provide some form of synch command (for example, a "cache
flush" operation) satisfying

C2. A synch command causes the issuing process to wait until all
    previously issued memory accesses have completed.

Properties C1 and C2 are rather informal. We restate them more precisely
as follows.

C1'. If the value obtained by a read $A$ issued by process $i$ is the one written
    by process $j$, then that value is the one written by the last-issued write
    $B$ in process $j$ such that $B \dashrightarrow A$.

C2'. If operation executions $A$, $B$, and $C$ are issued in that order by a
    single process, and $B$ is a synch, then $A \rightarrow C$.

Property C2' implies that B1-B3 are guaranteed if synch operations are
inserted in process $i$'s code immediately after statement $l$ (for B1),
immediately before the critical section (for B2), and immediately after the
critical section (for B3). Assumption B4 follows from C1'.
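
The placement of the three synch operations can be sketched in Python as
follows. This is only a structural illustration: synch() is a hypothetical
no-op stand-in, since CPython exposes no memory-fence command and its
interpreter already serializes these accesses. The function names, the
zero-based process numbering, and the bookkeeping in stats are likewise
assumptions made for the example.

```python
import threading
import time


def synch():
    """Hypothetical stand-in for a hardware synch command (property C2).

    A no-op here; on a real machine it would wait until all previously
    issued memory accesses have completed.
    """


def run(n, iterations):
    """Run n processes; return (critical sections executed, max overlap)."""
    x = [False] * n                       # shared flags, one per process
    stats = {"inside": 0, "max_inside": 0, "entries": 0}

    def spin_while(j):
        while x[j]:
            time.sleep(0)                 # yield so other threads can run

    def process(i):
        for _ in range(iterations):
            while True:
                x[i] = True               # statement l
                synch()                   # for B1: the write precedes the reads
                for j in range(i):
                    if x[j]:
                        x[i] = False
                        spin_while(j)
                        break             # goto l
                else:
                    break                 # no lower-numbered process competing
            for j in range(i + 1, n):
                spin_while(j)
            synch()                       # for B2: reads precede the critical section
            stats["inside"] += 1          # critical section begins
            stats["max_inside"] = max(stats["max_inside"], stats["inside"])
            stats["entries"] += 1
            stats["inside"] -= 1          # critical section ends
            synch()                       # for B3: critical section precedes the reset
            x[i] = False

    threads = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["entries"], stats["max_inside"]
```

A call such as run(3, 5) starts three processes that each execute the
critical section five times. The second returned value records the largest
number of processes observed in the critical section together.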

Now let us consider a more specialized memory architecture in which each
process has its own cache, and a write operation (asynchronously) updates
every copy of the memory cell that resides in the caches. In such an
architecture, the following additional condition is likely to hold:

C3. A read of a memory cell that resides in the process's cache precedes
    ($\rightarrow$) every operation execution issued subsequently by the same
    process.

If the memory system provides some way of ensuring that a memory cell 
is permanently resident in a process's cache, then B2 can be satisfied by 
keeping all the variables $x_j$ in process $i$'s cache. In this case, the synch
immediately preceding the critical section is not needed. 

3.3 Observations 

One might think that the purpose of memory synchronization commands is
to enforce orderings between commands issued by different processes.
However, B1-B3 are precedence relations between operations issued by the same
process. In general, one process cannot directly observe all the events in the
execution of an operation by another process. Hence, the results of
executing two operation executions $A$ and $D$ in different processes can permit
the deduction only of a causality ($\dashrightarrow$) relation between $A$ and
$D$. Only if $A$ and $D$ occur in the same process can $A \rightarrow D$ be deduced
by direct observation. Otherwise, deducing $A \rightarrow D$ requires the
existence of an operation $B$ in the same process as $A$ and an operation $C$ in
the same process as $D$ such that $A \rightarrow B \dashrightarrow C \rightarrow D$,
from which $A \rightarrow D$ follows by A4. Synchronization commands can
guarantee the relations $A \rightarrow B$ and $C \rightarrow D$.

The mutual exclusion example illustrates how a set of properties sufficient 
to guarantee correctness can be extracted directly from a correctness proof 
of the algorithm. Implementations of the algorithm on different memory 
architectures can be derived from the assumptions, with no further reasoning 
about the algorithm. 

4 Further Remarks 

The atomicity condition traditionally assumed for multiprocess programs is 
sequential consistency, meaning that the program behaves as if the memory 
accesses of all processes were interleaved and then executed sequentially [10]. 
It has been proposed that, when sequential consistency is not provided by 
the memory system, it be achieved by a constrained style of programming. 
Synchronization commands are added either explicitly by the programmer, 
or automatically from hints he provides. The method of [7, 8] can be applied
to our simple example, if the $x_i$ are identified by the programmer as
synchronization variables. However, in general, deducing what synchronization
commands are necessary requires analyzing all possible executions of the 
program, which is seldom feasible. Such an analysis is needed to find the 
precedence relations that, in the approach described here, are derived from 
the proof. 

Although it replaces traditional informal reasoning with a more rigorous,
axiomatic style, the proof method we have used is essentially behavioral: one
reasons directly about the set of operation executions. Behavioral
methods do not seem to scale well, and our approach is unlikely to be practical
for large, complicated algorithms. Most multiprocess programs for modern
multiprocessors are best written in terms of higher-level abstractions. The
method presented here can be applied to the algorithms that implement
these abstractions and to those algorithms, usually in the depths of the
operating system, where efficiency and correctness are crucial.

Assertional proofs are practical for more complicated algorithms. The
obvious way to reason assertionally about algorithms with nonatomic memory
operations is to represent a memory access by a sequence of atomic
operations [2, 9]. With this approach, the memory architecture and
synchronization operations are encoded in the algorithm. Therefore, a new proof
is needed for each architecture, and the proofs are unlikely to help discover
what synchronization operations are needed. A less obvious approach uses
the predicate transformers win (weakest invariant) and sin (strongest
invariant) to write assertional proofs for algorithms in which no atomic
operations are assumed, requirements on the memory architecture being described
by axioms [15]. Such a proof would establish the correctness of an algorithm
for a large class of memory architectures. However, in this approach, all
intraprocess $\rightarrow$ relations are encoded in the algorithm, so the proofs
are unlikely to help discover the very precedence relations that lead to the
introduction of synchronization operations.